Abstract: In a recent paper, "The capacity of the Hopfield model", J. Feng and B. Tirozzi claim to
prove rigorous results on the storage capacity that conflict with the predictions of the replica
approach. We show that their results are in error and that their approach, even when the worst
mistakes are corrected, does not yield any mathematically rigorous results.
The paper [FT] by Feng and Tirozzi addresses the interesting question of the storage capacity
of the Hopfield model. This value, namely the maximal ratio of the number of patterns to
the number of neurons for which the Hopfield model works as a memory, was first observed
numerically by Hopfield to be about 0.14, and a value close to that was obtained analytically
by Amit et al. [AGS] with the use of the replica trick. Refined estimates using replica symmetry
breaking schemes were obtained later. However, the replica trick is not mathematically rigorous,
and there have been many attempts to obtain such results in a mathematically rigorous way. The
best results in this respect so far are rigorous lower bounds on the critical value $\alpha_c$ by
Newman [N], which have recently been improved by Loukianova [L1] and Talagrand [Ta]. These bounds
are still off the expected value by at least 50 per cent. Obtaining upper bounds on $\alpha_c$ has
proven a much more difficult issue, and the only result to our knowledge was obtained very recently
by Loukianova [L2], who proved that for any $\alpha > 0$, the minimum of the Hamiltonian is at some
finite distance away from the patterns, and that as $\alpha$ tends to infinity, this distance tends
to at least 0.05. Although her proof is very nice and interesting, the numerical values are of
course far from satisfactory.

The main result that Feng and Tirozzi claim to prove "rigorously" is that, if a fraction $\delta$ of
errors in the retrieval is allowed, then the critical $\alpha = \alpha(\delta)$ is given by
$\alpha(\delta) = \frac{(1-2\delta)^2}{(1-\delta)^2}$.

This result appears obviously false, since it gives $\alpha(\delta)$ close to one if $\delta$ is
chosen close to 0, and $\alpha(\delta)$ close to zero if $\delta$ is close to 1/2, contrary to what
has to be the case. One might at first be tempted to believe that this formula for $\alpha(\delta)$
is a misprint, but it is repeated consistently in the paper, and moreover, based on this formula,
the authors argue that "the replica trick approach to the capacity of the Hopfield model is only
valid in the case $\alpha_N \to 0$ ($N \to \infty$)".

Given the very strong and surprising claims made in this paper, it appears worthwhile to 
analyze their "rigorous proof" in some detail in order to avoid misconceptions. 

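Feng and Tirozzi study the fixpoints of a deterministic gradient dynamics of the Hopfield model,
i.e. solutions of the equation

$$\sigma_i = \mathrm{sgn}\Big(\frac{1}{N}\sum_{\mu=1}^{p(N)}\sum_{j=1}^{N}\xi_i^\mu\xi_j^\mu\sigma_j\Big), \quad \text{for all } 1 \le i \le N \qquad (1.1)$$
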
Since they are interested in solutions "near" one pattern, say the first, it is reasonable to index
the configurations $\sigma$ by the set $B \subset \{1, \dots, N\}$ on which they differ from
$\xi^1$, i.e. to set¹

$$\sigma_i^B = \begin{cases} \xi_i^1, & \text{if } i \notin B \\ -\xi_i^1, & \text{if } i \in B \end{cases} \qquad (1.2)$$

¹ I regret to have to introduce some notation that is different from that of Feng and Tirozzi.

The fixpoint equation (1.1) can then be written in the form



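$$\sigma_i^B = \mathrm{sgn}\Big(\frac{1}{N}\sum_{\mu=1}^{p(N)}\sum_{j=1}^{N}\xi_i^\mu\xi_j^\mu\sigma_j^B\Big), \quad \text{for all } 1 \le i \le N \qquad (1.3)$$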



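After some elementary algebra, we can rewrite this as

$$\Big(1 - \frac{2|B|}{N}\Big) + \frac{1}{N}\sum_{\mu=2}^{p(N)}\sum_{j=1}^{N}\xi_i^1\xi_i^\mu\xi_j^\mu\sigma_j^B \;\begin{cases} > 0, & \text{if } i \notin B \\ < 0, & \text{if } i \in B \end{cases} \qquad (1.4)$$

where $|B|$ denotes the cardinality of the set $B$. Let us define the random variables²

$$Z_i(N,B) \equiv \frac{1}{N}\sum_{\mu=2}^{p(N)}\sum_{j=1}^{N}\xi_i^1\xi_i^\mu\xi_j^\mu\sigma_j^B \qquad (1.5)$$

Then (1.4) can be written as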

$$\begin{aligned} Z_i(N,B) &> -(1 - 2|B|/N), \quad \text{if } i \notin B \\ Z_i(N,B) &< -(1 - 2|B|/N), \quad \text{if } i \in B \end{aligned} \qquad (1.6)$$
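
The equivalence between the fixpoint equation and the two inequalities above can be checked numerically. The following is a minimal sketch, not taken from [FT]: it assumes the standard Hebbian local field $h_i = \frac{1}{N}\sum_{\mu}\sum_{j}\xi_i^\mu\xi_j^\mu\sigma_j$ with the sum over all $j$, takes $Z_i(N,B) = \frac{1}{N}\sum_{\mu\ge 2}\sum_j \xi_i^1\xi_i^\mu\xi_j^\mu\sigma_j^B$, and treats a vanishing field as "not a fixpoint". All function names are hypothetical, and everything is multiplied by $N$ so that only integers are compared.

```python
import random

def sigma_from_B(xi1, B):
    """Configuration equal to the first pattern except on B, where it is flipped."""
    return [-s if i in B else s for i, s in enumerate(xi1)]

def overlaps(patterns, sigma):
    """Unnormalised overlaps sum_j xi_j^mu * sigma_j, one per pattern mu."""
    return [sum(x * s for x, s in zip(xi, sigma)) for xi in patterns]

def is_fixpoint_direct(patterns, B):
    """Strict fixpoint test sigma_i * h_i > 0, with N*h_i = sum_mu xi_i^mu * m_mu."""
    sigma = sigma_from_B(patterns[0], B)
    m = overlaps(patterns, sigma)
    for i, s in enumerate(sigma):
        field = sum(xi[i] * m_mu for xi, m_mu in zip(patterns, m))
        if s * field <= 0:  # assumption: a zero field does not count as a fixpoint
            return False
    return True

def is_fixpoint_via_Z(patterns, B):
    """Test N*Z_i > -(N - 2|B|) for i outside B and N*Z_i < -(N - 2|B|) for i in B."""
    xi1 = patterns[0]
    n = len(xi1)
    sigma = sigma_from_B(xi1, B)
    m = overlaps(patterns, sigma)
    threshold = -(n - 2 * len(B))
    for i in range(n):
        # N * Z_i(N, B): contribution of the patterns mu >= 2 only
        n_times_z = sum(xi1[i] * xi[i] * m_mu
                        for xi, m_mu in zip(patterns[1:], m[1:]))
        if i in B:
            if not n_times_z < threshold:
                return False
        elif not n_times_z > threshold:
            return False
    return True

def random_patterns(p, n, rng):
    """p independent patterns of n symmetric +-1 entries."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]

if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 40, 3
    for _ in range(200):
        patterns = random_patterns(p, n, rng)
        B = set(rng.sample(range(n), rng.randint(0, 3)))
        assert is_fixpoint_direct(patterns, B) == is_fixpoint_via_Z(patterns, B)
    print("both formulations agree on all trials")
```

The check only confirms the algebra for finite $N$; it says nothing about the limiting behaviour of the $Z_i(N,B)$.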

The argument of [FT] relies basically on their observation in Eq. (4) that for any fixed $B$, the
random variables $Z_i(N,B)$ converge in distribution to $-\sqrt{\alpha}\,\zeta_i$³ (assuming that
$p(N)/N$ tends to $\alpha$), "by the central limit theorem", where the $\zeta_i$ are i.i.d. standard
normal variables. The remainder of their analysis is then based on the study of the distribution of
the $k$-th maxima of i.i.d. gaussian random variables.

This procedure involves an interchange of limits that is not justified. From the CLT, one
obtains the convergence of the variables $Z_i(N,B)$ in the sense that for a given $B$ and any fixed,
finite set $I$ of indices, the family $\{Z_i(N,B)\}_{i \in I}$ converges in distribution to a family
of i.i.d. gaussians. This does not imply that, e.g., $\max_{i=1}^{N} Z_i(N,B)$ converges, e.g. in
distribution, to the same limit as $\max_{i=1}^{N} \sqrt{\alpha}\,\zeta_i$. Maxima are not
continuous functions w.r.t. the product topology, and therefore convergence in distribution and
taking of maxima cannot be interchanged. (Take the following example: let $X_i(N) = \zeta_i$ if
$i < N$, and $X_N(N) = N$. This family converges to i.i.d. standard gaussians as $N$ tends to
infinity, but the maximum (which is always $N$) is totally different from the maximum of $N$ i.i.d.
gaussians, which behaves like $\sqrt{\ln N}$!)
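
The counterexample in parentheses is easy to reproduce numerically. The sketch below is only an illustration with hypothetical function names: it draws the family $X_i(N)$ (standard gaussians for $i < N$, with the last entry set to $N$) next to a genuinely i.i.d. gaussian family, and prints both maxima together with $\sqrt{2 \ln N}$, the usual leading-order size of the maximum of $N$ standard gaussians.

```python
import math
import random

def perturbed_family(n, rng):
    """X_i(N) = zeta_i for i < N and X_N(N) = N."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
    xs.append(float(n))
    return xs

def iid_gaussians(n, rng):
    """N i.i.d. standard gaussians, for comparison."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000, 10000):
        max_perturbed = max(perturbed_family(n, rng))
        max_iid = max(iid_gaussians(n, rng))
        scale = math.sqrt(2.0 * math.log(n))
        print(f"N = {n:6d}  max X_i(N) = {max_perturbed:9.1f}  "
              f"max iid = {max_iid:5.2f}  sqrt(2 ln N) = {scale:5.2f}")
```

Every fixed finite subfamily of the $X_i(N)$ is eventually i.i.d. gaussian, yet the first maximum grows linearly in $N$ while the second grows only like $\sqrt{\ln N}$.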

One should keep in mind that it is precisely this difficulty that has prevented reasonable upper
bounds on $\alpha_c$ in the past. Loukianova, for instance, uses a very clever idea of "negative
association" to compare the dependent variables $Z_i(N,B)$ to independent ones, but for this a price
had to be paid that prevented sharp estimates.

² In [FT] these are defined in (4), and given the strange name g(N,p(N)), which makes reference
neither to their dependence on the index $i$ nor the set $B$. We need these attributes to make
meaningful statements later.

³ The minus sign appearing here is rather unconventional, given that $\zeta_i$ and $-\zeta_i$ have
the same distribution. However, it plays a role in the course of the mistakes they make later.

While at this point it is clear that the arguments in [FT] are in no way "rigorous", it may
still be interesting to follow the sequel of their arguments in some more detail. Let us first
consider the case of "perfect retrieval", Subsection 3.1. Here $B$ is the empty set and their result
relies on the assumption that the distribution of the maximum over $i$ of the $Z_i(N,\emptyset)$
converges to that of i.i.d. gaussians, which is not justified. There is no surprise in the fact that
they obtain the same result as McEliece et al. [MPRV], because the heuristic arguments of [MPRV]
(which do not claim to give a rigorous proof!) are identical to those put forward here. I should
stress that to my knowledge there is no rigorous proof of the "if and only if" statement.

In Subsection 3.2, [FT] study the case of non-perfect retrieval, i.e. they look for the critical
$\alpha(\delta)$ such that a fixpoint $\sigma^B$ will exist with $|B| = \delta N$. The way they seem
to argue is as follows: $Z_i(N,B)/\sqrt{\alpha}$ converges to a family of i.i.d. gaussian r.v.'s.
Then, what is the fraction of $N$ i.i.d. gaussians that is larger than $x$? If this number is
$g(x)$, then a fraction $g(-(1-2\delta)/\sqrt{\alpha})$ of the $Z_i(N,B)$ will be larger than
$-(1-2\delta)$, and so we will find a set $B$ of size $\delta N$ precisely when
$(1-\delta) = g(-(1-2\delta)/\sqrt{\alpha})$. At this point the authors claim (see the first phrase
on page 3386, and figure 1) that the $[xN]$-largest of the $N$ gaussians $\zeta_i$ is of order $x$,
so that $[xN]$ of them would be smaller than $x$, i.e. $[xN]$ of the $-\sqrt{\alpha}\,\zeta_i$ would
be larger than $-\sqrt{\alpha}\,x$, which means that they take $g(x) = -x$. It does not become clear
from where they draw this claim, but it is clear that it is totally wrong and leads to their absurd
main result.
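
To see what the choice $g(x) = -x$ leads to, one can solve the condition $(1-\delta) = g(-(1-2\delta)/\sqrt{\alpha})$ for $\alpha$, which gives $\alpha(\delta) = (1-2\delta)^2/(1-\delta)^2$. The short sketch below, with hypothetical function names and assuming the reading of the argument of [FT] given above, tabulates this expression and checks that it does solve the condition.

```python
import math

def g_ft(x):
    """Counting function that [FT] appear to use: g(x) = -x (as read above)."""
    return -x

def alpha_ft(delta):
    """Solution of (1 - delta) = g_ft(-(1 - 2*delta) / sqrt(alpha)) for alpha."""
    return ((1.0 - 2.0 * delta) / (1.0 - delta)) ** 2

def residual(delta):
    """Left side minus right side of the condition, for 0 <= delta < 1/2."""
    alpha = alpha_ft(delta)
    return (1.0 - delta) - g_ft(-(1.0 - 2.0 * delta) / math.sqrt(alpha))

if __name__ == "__main__":
    for delta in (0.0, 0.01, 0.1, 0.25, 0.4, 0.49):
        print(f"delta = {delta:4.2f}   alpha(delta) = {alpha_ft(delta):.4f}   "
              f"residual = {residual(delta):.1e}")
```

The table decreases from 1 at $\delta = 0$ towards 0 as $\delta$ approaches 1/2, that is, allowing more errors would lower the capacity.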

One may ask whether their arguments can be improved. First, how can we compute the size
of the set of indices for which $Z_i(N,B) > x$? Obviously, one would want to study the quantity

$$G_N(x,B) \equiv \frac{1}{N}\sum_{i=1}^{N} 1_{\{Z_i(N,B) > x\}} \qquad (1.7)$$

which is nothing but (one minus) the distribution function of the empirical measure of the variables
$Z_i(N,B)$. Now, if we replaced all the $Z_i(N,B)$ by $\sqrt{\alpha}\,\zeta_i$ (I take the freedom
to drop their pointless minus sign), then by the strong law of large numbers




$$\lim_{N\uparrow\infty}\frac{1}{N}\sum_{i=1}^{N} 1_{\{\sqrt{\alpha}\,\zeta_i > x\}} = \frac{1}{\sqrt{2\pi}}\int_{x/\sqrt{\alpha}}^{\infty} e^{-z^2/2}\,dz \equiv 1 - \Phi(x/\sqrt{\alpha}), \quad \text{almost surely} \qquad (1.8)$$

where $\Phi$ denotes the standard normal distribution function. This would then yield the somewhat
more reasonable looking result $\alpha(\delta) = \frac{(1-2\delta)^2}{(\Phi^{-1}(1-\delta))^2}$. Is
that result to be trusted? First, the CLT again cannot justify the passage to the gaussians, because
just as the maxima, the empirical measure is not a continuous function. If one is somewhat
optimistic, one may hope to prove convergence of $G_N(x,B)$ to a gaussian distribution function, for
fixed $B$. (E.g. Talagrand [Ta] proves this for the case $B = \emptyset$.) But even that would by no
means imply that $Z_i(N,B)$ itself is getting independent of $B$, as $N$ increases, and that the set
$B'(B)$ of indices $i$ for which $Z_i(N,B) < -(1-2\delta)$ would coincide with $B$. One might want
to argue that there should be a set $B$ for which $B'(B) = B$, but then there is no reason why for
this random set the convergence of the empirical measure should hold. To make such a statement, one
would at least have to get control on the convergence of $G_N(x,B)$ uniformly in the different
possible sets $B$, i.e. we should need some estimate like

$$\mathbb{P}\Big[\sup_{B \subset \{1,\dots,N\}} \big\| G_N(x,B) - \big(1 - \Phi(x/\sqrt{\alpha})\big) \big\| > \epsilon\Big] \to 0 \qquad (1.9)$$

for all $\epsilon > 0$, for some norm $\|\cdot\|$. It does not seem likely that such a result is
true, let alone that it can be proven. Note that the main difficulty here is that the number of sets
$B$ is exponentially large, and precisely this fact is responsible for the relatively poor lower
bounds on $\alpha_c$ that exist in the literature.
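
For orientation only, the prediction obtained by pretending that the $Z_i(N,B)$ are i.i.d. $\sqrt{\alpha}\,\zeta_i$ can be evaluated numerically. The sketch below assumes that the condition is $1 - \Phi(-(1-2\delta)/\sqrt{\alpha}) = 1 - \delta$ with $\Phi$ the standard normal distribution function, which gives $\alpha(\delta) = (1-2\delta)^2/(\Phi^{-1}(1-\delta))^2$ for $0 < \delta < 1/2$. It also compares the gaussian tail with the empirical fraction for genuinely independent gaussians, which is exactly the replacement that, as explained above, is not justified for the true variables. Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def gaussian_tail_fraction(x, alpha):
    """P[sqrt(alpha) * zeta > x] for a standard gaussian zeta."""
    return 1.0 - STD_NORMAL.cdf(x / math.sqrt(alpha))

def empirical_tail_fraction(x, alpha, n, rng):
    """Fraction of n independent sqrt(alpha) * zeta_i that exceed x."""
    root = math.sqrt(alpha)
    return sum(1 for _ in range(n) if root * rng.gauss(0.0, 1.0) > x) / n

def alpha_gaussian(delta):
    """(1 - 2 delta)^2 / (Phi^{-1}(1 - delta))^2, defined for 0 < delta < 1/2."""
    q = STD_NORMAL.inv_cdf(1.0 - delta)
    return ((1.0 - 2.0 * delta) / q) ** 2

if __name__ == "__main__":
    rng = random.Random(1)
    for delta in (0.01, 0.05, 0.1, 0.3, 0.45):
        alpha = alpha_gaussian(delta)
        x = -(1.0 - 2.0 * delta)
        exact = gaussian_tail_fraction(x, alpha)
        sampled = empirical_tail_fraction(x, alpha, 20000, rng)
        print(f"delta = {delta:4.2f}  alpha = {alpha:.4f}  "
              f"tail = {exact:.4f}  sampled = {sampled:.4f}")
```

In this idealised setting $\alpha(\delta)$ tends to 0 as $\delta$ tends to 0 and increases with $\delta$, approaching $2/\pi$ as $\delta$ tends to 1/2. None of this bears on the dependent variables $Z_i(N,B)$ themselves.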

In conclusion, the paper by Feng and Tirozzi has unfortunately not contributed to progress 
in the mathematical understanding of this interesting and challenging problem. Even if the most 
obvious mistakes are corrected, there remain fundamental problems in the basic approach, and even
the improved prediction on $\alpha(\delta)$ is no more rigorous and rather less convincing than the predictions
of the replica approach. 